Collaborators: Tomo Tanaka, John Eykelenboom.
In our data we distinguish four states, marked by colour: blue, brown, pink, red. For the purpose of the model we merge blue and brown together into one state. Here is an example from an experiment with 57 cells.
The next figure shows the proportions of each colour as a function of time. The curves are smoothed with a running mean over the window of 5 time points.
The model consists of three states and a set of rules.
* A switch, \(t_{2, ref}\) was introduced to the model, selecting the reference for time \(t_2\). In the default position (\(t_{2, ref} = 1\)) the B\(\rightarrow\)P activation time is \(t_0-t_1 + \Delta t_2\). When the switch is set to \(t_{2, ref} = 0\), the activation time is \(t_0 + \Delta t_2\), that is, it can only occur after the nuclear envelope breakdown. The idea of the switch was to separate pink and red curves, as observed in some data sets.
I use a Markov chain approach. The next state is generated from the current state based on rules outlined above. The rates, \(k\), are converted into probabilities over a given time step \(\Delta t\) as \(Pr = 1 - e^{1 - k\Delta t}\). The transition times, \(t_1\), \(t_2 = \Delta t_2\) and \(t_3 = \Delta t_3\), are generated before the simulation starts for a given cell.
The cell timeline is repeated for \(n_{\rm cell}\) times and then the colour proportions are found at each time point.
Note: though we experimented with varying \(t_0\) and a possibility of a “reversed” P\(\rightarrow\)B transition, they are not discussed in the paper. Also, we don’t discuss \(t_{2, ref}\) switch.
Here is an example of the model.
And here is an example of the the model where P\(\rightarrow\)R transition can occur only after \(t_0 = 0\).
There is a shiny app that allows tuning parameters in search for the best solution.
I attempted fitting model to our data. It is not an easy task, as the model is stochastic. After some testing I found that modified BFGS (a quasi-Newton method, also known as a variable metric algorithm, by Broyden, Fletcher, Goldfarb and Shanno, 1970) which allows box constraints (Byrd et al. 1995), gives reasonable results. Again, due to stochasticity of the model, this algorithm often finds false local minima. I run it several times to find the best minimum. It is a crude and time-consuming method, but gives results better than manual tuning.
Fitting is constrained to time points between -50 and 30 min. The minimized quantity is
\({\rm rms} = \sqrt{\sum_{c \in \{B,P,R\}} \sum_i (O_{c,i} - E_{c,i})^2}\)
that is, the square root of the sum of squared residuals over all time points and all colours.
First, we fit the model with \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\) free. To improve the chances of finding the correct global minimum, I run the fit with 10,000 cells and 30 tries.
We suspect that \(t_0\) might not be exactly zero. Hence I fit the model with \(t_0 = 0, -5, -10, -15\) minutes.
Here (and henceforward) I quote the snakemake rule (in Snakefile file) to create these results:
rule fit_scramble_t0
Let’s read and plot the results.
I think the issue here is not \(t_0\) but the fact that the red curve growths is too fast for the model to fully account for it.
rule fit_all
Here are fit results for all conditions with \(t_0\) fixed at zero and four parameters free: \(\tau_1\), \(\tau_2\), \(k_1\) and \(k_2\). The RMS is computed over (-50, 30) min range (though the model is shown on the entire range).
We notice (see later) that \(\tau_2\) is poorly constrained and similar in all conditions. We fix it at 8 min and fit the model with three free parameters.
rule fit_all_3par
RAD21 looks problematic. We want to see if it works with higher \(\tau_2\). Below, there is a comparison between the default 3-parameter fit (\(\tau_2\) fixed at 8 min) and a fit where \(\tau_2\) was fixed at 15 min.
rule fit_RAD21_t15
As we can see the model cannot reproduce RAD21 data because of the excess of pink state at early times. This is consistent with cohesin not been properly established due to RAD21 removal. This data set is beyond the scope of our model.
Here are results from a modified model, where P\(\rightarrow\)R transition can happen only after \(t_0\). By default \(t_0 = 0\) and the results are shown in the left panels below. As you can see, the red curve in the model lags behind the red curve in the data. Hence, I fitted the data again, but fixing \(t_0 = -10\) min this time (right panels). This aligns the red curves a little better.
We can also see that the pink curve is more peaked, as opposed to the smooth rise and decay in the default model.
rule fit_all_ref0
rule fit_all_ref0_t10
It is not easy to find confidence intervals in fit parameters when not only the model is non-analytic, but also stochastic. The only feasible approach it bootstrap, which is computationally expensive. Here I give it a try.
The bootstrap is performed by sampling with replacement from the data and fitting each sample. Then, we look at the distribution of bootstrapped fit parameters.
Running bootstraps on the cluster. This is quite slow!
rule boot_all
Once all is done, we read the results for selected sets.
Here is the distribution of bootstrap result. Each bootstrap gives a set of fit parameters. Figures below show the distribution of fit parameters across all bootstraps.
And here are fit parameters and their 95% confidence intervals:
These are box plots with whiskers encompassing 90% of data:
In this table we show the median, lower and upper 95% CI and the 25th and 75th percentiles of each parameter:
| set | HL1_med | HL2_med | k1_med | k2_med | HL1_lo95 | HL2_lo95 | k1_lo95 | k2_lo95 | HL1_up95 | HL2_up95 | k1_up95 | k2_up95 | HL1_p25 | HL2_p25 | k1_p25 | k2_p25 | HL1_p75 | HL2_p75 | k1_p75 | k2_p75 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| untreated | 12.5 | 6.48 | 0.0654 | 0.0448 | 11.1 | 4.56 | 0.0551 | 0.0338 | 14.5 | 9.92 | 0.0835 | 0.0577 | 11.9 | 5.26 | 0.0612 | 0.0409 | 13 | 7.66 | 0.0709 | 0.0497 |
| scramble | 12.8 | 6.29 | 0.0652 | 0.044 | 11.3 | 4.36 | 0.0562 | 0.0338 | 14.9 | 10.3 | 0.0783 | 0.0589 | 12.2 | 5.14 | 0.0619 | 0.0403 | 13.6 | 7.71 | 0.0701 | 0.0488 |
| NCAPD2 | 11.9 | 7.16 | 0.074 | 0.0195 | 10.4 | 4.5 | 0.0597 | 0.0147 | 13.8 | 17.4 | 0.102 | 0.0311 | 11.4 | 6.49 | 0.0673 | 0.0174 | 12.5 | 9.67 | 0.0812 | 0.0219 |
| NCAPD3 | 7.62 | 5.34 | 0.0394 | 0.0389 | 6.63 | 4.87 | 0.0335 | 0.0301 | 7.78 | 14.8 | 0.0459 | 0.0694 | 7.53 | 5.3 | 0.0373 | 0.0365 | 7.64 | 7.85 | 0.0415 | 0.0416 |
| SMC2 | 7.43 | 4.38 | 0.0487 | 0.0201 | 5.59 | 3.43 | 0.0426 | 0.0143 | 8.18 | 15 | 0.0582 | 0.0354 | 6.85 | 4.2 | 0.0467 | 0.0178 | 7.66 | 5.44 | 0.0519 | 0.0229 |
| RAD21 | 16.7 | 7.74 | 0.0777 | 0.0361 | 13.3 | 4.25 | 0.0531 | 0.0209 | 21.6 | 22 | 0.123 | 0.0892 | 15.5 | 5.68 | 0.0669 | 0.03 | 17.9 | 10.7 | 0.0913 | 0.0443 |
| WAPL48 | 24.4 | 3.79 | 0.0082 | 0.00992 | 14.5 | 2.08 | 0.00576 | 0.00526 | 34.2 | 26.3 | 0.0122 | 0.0239 | 21.2 | 2.08 | 0.00714 | 0.0081 | 28.1 | 10.5 | 0.00934 | 0.0125 |
| TT108 | 13.6 | 8.5 | 0.0746 | 0.0306 | 11.3 | 4.14 | 0.0527 | 0.0215 | 16.9 | 15.4 | 0.13 | 0.0478 | 12.7 | 6.55 | 0.0648 | 0.027 | 14.7 | 11.1 | 0.0929 | 0.0364 |
Here is link to the above table if you need to download it.
The next plot shows the correlation between \(\tau_2\) and \(k_2\) across bootstraps. The red line is a local regression line (LOESS).
Here I perform bootstrap analysis with fixed \(\tau_2=8\) min.
rule boot_all_3par
Here is the distribution of bootstrap result.
And here are fit parameters and their 95% confidence intervals:
These are default box plots with whiskers encompassing 90% of data:
In this table we show the median value of each parameter:
| set | HL1 | k1 | k2 |
|---|---|---|---|
| untreated | 12.6 | 0.0636 | 0.0425 |
| scramble | 12.9 | 0.0648 | 0.0416 |
| NCAPD2 | 11.8 | 0.0759 | 0.0173 |
| NCAPD3 | 7.61 | 0.0397 | 0.0376 |
| SMC2 | 7.31 | 0.0491 | 0.0205 |
| RAD21 | 16.5 | 0.0761 | 0.0307 |
| WAPL48 | 27.2 | 0.00765 | 0.00889 |
| TT108 | 13.8 | 0.0726 | 0.0267 |
Here I perform bootstrap analysis with fixed \(\tau_2=0\) min.
rule boot_all_3par0
Here is the distribution of bootstrap result.
And here are fit parameters and their 95% confidence intervals:
These are default box plots with whiskers encompassing 90% of data:
In this table we show the median value of each parameter:
| set | HL1 | k1 | k2 |
|---|---|---|---|
| untreated | 12.4 | 0.065 | 0.0354 |
| scramble | 12.8 | 0.0643 | 0.0352 |
| NCAPD2 | 11.9 | 0.0748 | 0.0151 |
| NCAPD3 | 7.52 | 0.04 | 0.0317 |
| SMC2 | 7 | 0.0507 | 0.0183 |
| RAD21 | 16.9 | 0.0719 | 0.0267 |
| WAPL48 | 28.1 | 0.0075 | 0.00839 |
| TT108 | 13.8 | 0.0724 | 0.0222 |
Here I create cell-line plots for all data sets using four colours.
PDF files:
| link |
|---|
| untreated |
| scramble |
| NCAPD2 |
| NCAPD3 |
| SMC2 |
| WAPL24 |
| WAPL48 |
| MK1775 |
| MK1775_ICRF193 |
| RAD21 |
| TT103 |
| TT108 |
This work is not included in the paper.
In this Section we consider the distribution of brown or pink boxes. Hereafter, I call them “brown”, for simplicity, but we do take both brown and pink boxes into account.
Note on terminology: I refer to individual boxes in cell tracks as “boxes” or “points”. I refer to chains of consecutive boxes of the same type as “events”.
Here I calculate the density of brown boxes and the distribution of inter-arrival times between them. If brown points are distributed randomly with density \(\lambda\), the intervals between them should follow the exponential distribution with PDF
\(f(d) = \lambda e^{-\lambda d}\)
where \(d\) is the inter-arrival time between consecutive brown points. Because some of the cells have much shorter data tracks, they will bias the inter-arrival distribution towards shorter intervals. The longer distances are missing from these short tracks. To account for this I made a simple model, in which I take cell tracks from the actual data, fill them randomly with brown points and calculate inter-arrival distribution. This is done 100 times to smooth the distribution.
The example below shows the result for scramble. The black bars represent data, the orange points show the predicted theoretical distribution \(f(d) = \lambda e^{-\lambda d}\) and the open blue bars show the simulated random distribution. As we can see, the simulated distribution predicts more short intervals with respect to the theoretical distribution, as expected.
I use a window of [-300, -20] min.
This plot has limited interpretation. Inter-arrival time between individual one-minute boxes assumes that each box is one event. But in reality, brown event can last less or more than one minute. If we see, e.g., four brown boxes next to each other, it is unlikely that this shows four one-minute events. Much more likely, this is one event that lasts four minutes.
Instead, we should consider the distribution of full brown events, short and long. Below, I show the distribution of duration of brown events. Before the durations are calculated, I fill the occasional missing data with random brown or blue points, where brown probability is the same as the overall brown density in the data. This means that this figure is slightly random and the size of bars will change from one instance to another. The differences are rather small, though (no more than a few percent of the bar size).
The blue open bars show the expected distribution of duration of events derived from a random distribution of single brown points (noise). It shows that longer events (3 minutes or longer) are over-represented, by comparison with random noise. Brown occurrences cannot be explained by pure noise, they do form longer non-random “events”.
We can get more insight from the distribution of gaps between the brown events. Here I use the gaps between the events, not taking brown begin/end into account. For example, if “+” is brown/pink and “-” is blue (or red) then this cell track
---+--++++---+-
shows three brown events of length 1, 4, and 1 and two “gap” events of lengths 2 and 3. The blue sequences at the beginning and at the end are ignored, because we don’t know how long they are, we only see part of them.
The figure below shows the distribution of gaps between brows events. The simulated distribution (open bars) is calculated in the following way:
If brown events were random, both data and simulated distributions would be similar. The p-value in the title shows the result from a chi-square test comparing data and simulation. The number “d” in each title is the density of brown boxes (\(\lambda\)).
As we can see, there is an excess of short gaps in comparison to a random distribution. This means that brown events tend to cluster together. In particular, the 1-min gaps are more prominent, suggesting that one brown event might trigger another, immediately after it. On the other hand, this is our timing resolution, so, to some extent, it might be an experimental artifact. For example, if we have a long brown event and one box in the middle is misidentified as blue, than it creates an artificial gap of size 1. If this happens a few times, we get the observed pattern.
Below, are results for all conditions. Note: each plot shows a p-value from a chi-square test between the data and the model shown. These p-values are typically very small. For clarity, I replaced each \(p < 10^{-16}\) with zero.
Here I compare event duration and gap duration distribution for each pair of conditions. I use chi-square test and the null hypothesis is that the distribution does not depend on the sample.
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| untreated |
|
0.48 | 0.26 | 0.49 | 0.19 | 0.88 | 0.021 | 0.49 | 0.85 | 4e-05 | 0.53 | 0.12 |
| scramble | 0.48 |
|
0.0026 | 0.82 | 0.12 | 0.095 | 0.42 | 0.057 | 0.48 | 0.12 | 0.12 | 0.26 |
| NCAPD2 | 0.26 | 0.0026 |
|
0.47 | 0.21 | 0.51 | 3.6e-05 | 0.69 | 0.19 | 1.2e-10 | 0.35 | 0.086 |
| NCAPD3 | 0.49 | 0.82 | 0.47 |
|
0.84 | 0.37 | 0.0076 | 0.38 | 0.7 | 4.8e-05 | 0.73 | 0.15 |
| SMC2 | 0.19 | 0.12 | 0.21 | 0.84 |
|
0.46 | 0.0039 | 0.27 | 0.33 | 1.6e-07 | 0.64 | 0.12 |
| WAPL24 | 0.88 | 0.095 | 0.51 | 0.37 | 0.46 |
|
0.012 | 0.77 | 0.78 | 1e-06 | 0.93 | 0.47 |
| WAPL48 | 0.021 | 0.42 | 3.6e-05 | 0.0076 | 0.0039 | 0.012 |
|
0.0052 | 0.26 | 0.26 | 0.054 | 0.07 |
| MK1775 | 0.49 | 0.057 | 0.69 | 0.38 | 0.27 | 0.77 | 0.0052 |
|
0.28 | 5.6e-05 | 0.72 | 0.41 |
| MK1775_ICRF193 | 0.85 | 0.48 | 0.19 | 0.7 | 0.33 | 0.78 | 0.26 | 0.28 |
|
0.0022 | 0.94 | 0.91 |
| RAD21 | 4e-05 | 0.12 | 1.2e-10 | 4.8e-05 | 1.6e-07 | 1e-06 | 0.26 | 5.6e-05 | 0.0022 |
|
2e-07 | 0.00019 |
| TT103 | 0.53 | 0.12 | 0.35 | 0.73 | 0.64 | 0.93 | 0.054 | 0.72 | 0.94 | 2e-07 |
|
0.39 |
| TT108 | 0.12 | 0.26 | 0.086 | 0.15 | 0.12 | 0.47 | 0.07 | 0.41 | 0.91 | 0.00019 | 0.39 |
|
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| untreated |
|
0.65 | 0.43 | 0.48 | 0.2 | 0.37 | 0.098 | 0.5 | 0.87 | 0.29 | 0.82 | 0.18 |
| scramble | 0.65 |
|
0.16 | 0.32 | 0.0094 | 0.21 | 0.52 | 0.2 | 0.41 | 0.077 | 0.029 | 0.038 |
| NCAPD2 | 0.43 | 0.16 |
|
0.77 | 0.64 | 0.71 | 0.11 | 0.58 | 0.24 | 0.0027 | 0.42 | 0.42 |
| NCAPD3 | 0.48 | 0.32 | 0.77 |
|
0.52 | 0.79 | 0.39 | 0.53 | 0.56 | 0.15 | 0.71 | 0.66 |
| SMC2 | 0.2 | 0.0094 | 0.64 | 0.52 |
|
0.23 | 0.025 | 0.32 | 0.31 | 0.0013 | 0.53 | 0.85 |
| WAPL24 | 0.37 | 0.21 | 0.71 | 0.79 | 0.23 |
|
0.088 | 0.5 | 0.78 | 0.0068 | 0.39 | 0.26 |
| WAPL48 | 0.098 | 0.52 | 0.11 | 0.39 | 0.025 | 0.088 |
|
0.2 | 0.13 | 0.11 | 0.015 | 0.33 |
| MK1775 | 0.5 | 0.2 | 0.58 | 0.53 | 0.32 | 0.5 | 0.2 |
|
0.42 | 0.047 | 0.27 | 0.47 |
| MK1775_ICRF193 | 0.87 | 0.41 | 0.24 | 0.56 | 0.31 | 0.78 | 0.13 | 0.42 |
|
0.25 | 0.48 | 0.3 |
| RAD21 | 0.29 | 0.077 | 0.0027 | 0.15 | 0.0013 | 0.0068 | 0.11 | 0.047 | 0.25 |
|
0.041 | 0.018 |
| TT103 | 0.82 | 0.029 | 0.42 | 0.71 | 0.53 | 0.39 | 0.015 | 0.27 | 0.48 | 0.041 |
|
0.11 |
| TT108 | 0.18 | 0.038 | 0.42 | 0.66 | 0.85 | 0.26 | 0.33 | 0.47 | 0.3 | 0.018 | 0.11 |
|
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| untreated |
|
0.45 | 0.53 | 0.71 | 0.45 | 0.86 | 0.24 | 0.45 | 0.73 | 0.02 | 0.66 | 0.13 |
| scramble | 0.45 |
|
0.35 | 0.94 | 0.26 | 0.16 | 0.58 | 0.14 | 0.64 | 0.86 | 0.42 | 0.36 |
| NCAPD2 | 0.53 | 0.35 |
|
0.56 | 0.56 | 0.66 | 0.26 | 0.76 | 0.53 | 0.09 | 0.46 | 0.3 |
| NCAPD3 | 0.71 | 0.94 | 0.56 |
|
0.95 | 0.48 | 0.13 | 0.15 | 0.69 | 0.13 | 0.68 | 0.27 |
| SMC2 | 0.45 | 0.26 | 0.56 | 0.95 |
|
0.53 | 0.34 | 0.29 | 0.41 | 0.011 | 0.61 | 0.096 |
| WAPL24 | 0.86 | 0.16 | 0.66 | 0.48 | 0.53 |
|
0.39 | 0.6 | 0.78 | 0.012 | 0.84 | 0.13 |
| WAPL48 | 0.24 | 0.58 | 0.26 | 0.13 | 0.34 | 0.39 |
|
0.2 | 0.73 | 0.52 | 0.92 | 0.52 |
| MK1775 | 0.45 | 0.14 | 0.76 | 0.15 | 0.29 | 0.6 | 0.2 |
|
0.45 | 0.079 | 0.24 | 0.5 |
| MK1775_ICRF193 | 0.73 | 0.64 | 0.53 | 0.69 | 0.41 | 0.78 | 0.73 | 0.45 |
|
0.51 | 0.99 | 0.69 |
| RAD21 | 0.02 | 0.86 | 0.09 | 0.13 | 0.011 | 0.012 | 0.52 | 0.079 | 0.51 |
|
0.16 | 0.14 |
| TT103 | 0.66 | 0.42 | 0.46 | 0.68 | 0.61 | 0.84 | 0.92 | 0.24 | 0.99 | 0.16 |
|
0.52 |
| TT108 | 0.13 | 0.36 | 0.3 | 0.27 | 0.096 | 0.13 | 0.52 | 0.5 | 0.69 | 0.14 | 0.52 |
|
These tables show that there is very little discernible difference between conditions. Only TT103 differs from some other conditions in gap distribution. Otherwise, we have no evidence to reject the null hypothesis (but we cannot accept it either!).
WARNING: S and G2 data are calculated on a 2-min grid, while other sets are based on 1-min grid. To compare them, I bin their distributions from a smaller grid. I don’t bin raw data, as there is a problem of the reference frame. Binning 1-2, 3-4, 5-6, … will give different results than 2-3, 4-5, 6-7, …
The distributions are easier to compare. Consider a 1-min grid distribution where we have 50 1-min events and 20 2-min events. I bin them together into 50+20=70 events and compare to the number of 2-min events of S or G2 (which contains all events shorter than 2 minutes).
Depending on how the actual brown events are distributed (their true duration and timing) comparing data based on different time grids might introduce unpredictable biases.
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | S | G2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S | 2.5e-05 | 2.5e-11 | 0.045 | 0.0002 | 1.3e-05 | 0.02 | 2.7e-08 | 0.24 | 0.00016 | 6.2e-20 | 0.0007 | 0.00072 |
|
0.24 |
| G2 | 0.008 | 4.7e-06 | 0.084 | 0.05 | 0.023 | 0.056 | 2.8e-06 | 0.23 | 0.061 | 6.4e-13 | 0.33 | 0.054 | 0.31 |
|
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | S | G2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S | 0.015 | 0.044 | 7.6e-05 | 0.0094 | 0.00021 | 0.003 | 0.18 | 0.035 | 0.077 | 0.62 | 0.00044 | 0.027 |
|
0.26 |
| G2 | 0.00066 | 0.0051 | 1e-08 | 9.2e-05 | 5.6e-09 | 8.3e-05 | 0.051 | 0.0088 | 0.024 | 0.027 | 6.2e-07 | 1.4e-05 | 0.31 |
|
| untreated | scramble | NCAPD2 | NCAPD3 | SMC2 | WAPL24 | WAPL48 | MK1775 | MK1775_ICRF193 | RAD21 | TT103 | TT108 | S | G2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S | 0.00042 | 1.6e-07 | 0.004 | 0.00027 | 0.00095 | 0.029 | 0.0049 | 0.15 | 0.00011 | 1.7e-09 | 8.1e-05 | 0.0011 |
|
0.1 |
| G2 | 0.77 | 0.013 | 0.43 | 0.68 | 0.36 | 0.16 | 0.12 | 0.41 | 0.053 | 0.00032 | 0.27 | 0.07 | 0.3 |
|